# Gerarden and Yang (2022): Data and code for replication

This replication package contains publicly available data and all code for "Using Targeting to Optimize Program Design: Evidence from an Energy Conservation Experiment" by Todd Gerarden and Muxi Yang.

Data on energy consumption, household attributes, and program participation are proprietary, and their publication is restricted by a non-disclosure agreement. Contact Muxi Yang (my458@cornell.edu) for instructions on how to request access to the proprietary data.

Software used:
--------------------
MathWorks MATLAB R2022a
IBM ILOG CPLEX 22.1.0.0
Stata/MP 17
RStudio 2021.09.02 Build 382, R version 4.1.2 (w/ dependencies managed using the renv package)

Dataset construction: 
----------------------------
Run the following Stata do files to generate the estimation sample:
	1. ./code/build/build_1_import_raw_data.do
		- Import electricity consumption, treatment status by home energy report wave, and household demographics
		- All data are at the account level
	2. ./code/build/build_2_clean_data_elec.do
		- Clean and merge data
		- Output data for estimation

Analysis replication files:
---------------------------------
Run the following five scripts in ./code/analyze, in order, to generate all figures and tables in the paper and the online appendix.
	1. analyze_1_tables.do 
	2. analyze_1_figures.do
	3. analyze_3_estimate_ewm_rules.m 
	4. analyze_4_summarize_ewm_rules.m
	5. coef_plot/coefficient_plots.R
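
The run order above, together with the two build scripts, can be sketched as a small Python driver. This is only an illustration and is not part of the replication package. It assumes a Unix-like system with executables named `stata-mp`, `matlab`, and `Rscript` on the PATH, that it is launched from the package root, and that the scripts run without interactive input. The names `PIPELINE`, `build_command`, and `run_pipeline` are hypothetical.

```python
import subprocess
from typing import List

# Script order as listed in this readme: build scripts first, then analysis.
PIPELINE: List[str] = [
    "./code/build/build_1_import_raw_data.do",
    "./code/build/build_2_clean_data_elec.do",
    "./code/analyze/analyze_1_tables.do",
    "./code/analyze/analyze_1_figures.do",
    "./code/analyze/analyze_3_estimate_ewm_rules.m",
    "./code/analyze/analyze_4_summarize_ewm_rules.m",
    "./code/analyze/coef_plot/coefficient_plots.R",
]


def build_command(script: str) -> List[str]:
    """Map a script to a batch-mode command based on its file extension.

    ASSUMPTION: executable names and batch flags match a typical Unix install;
    adjust them for your system (Windows launchers differ).
    """
    if script.endswith(".do"):
        return ["stata-mp", "-b", "do", script]
    if script.endswith(".m"):
        return ["matlab", "-batch", f"run('{script}')"]
    if script.endswith(".R"):
        return ["Rscript", script]
    raise ValueError(f"No launcher known for {script}")


def run_pipeline(dry_run: bool = True) -> List[List[str]]:
    """Print the commands (dry run) or execute them in order, stopping on failure."""
    commands = [build_command(script) for script in PIPELINE]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

The dry run is the default so that the commands can be inspected before anything is executed. The build scripts and the analysis scripts require the proprietary data, so a full run only works once access has been granted.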

The rest of this readme summarizes which scripts output which figures and tables.

1. Regression analysis of the average treatment effect and heterogeneous treatment effects on the estimation sample: 
	a) ./code/analyze/analyze_1_tables.do 
	    - Table 1: Number of accounts by HER wave
	    - Table 2: Summary statistics 
	    - Table 3: Average treatment effect 
	    - Table 4: Heterogeneous treatment effects 
	    - Online Appendix Table A1: Summary of estimation sample 
	    - Online Appendix Tables B1-B3: Balance of covariates 
	    - Online Appendix Table G1: Robustness check of average treatment effect 
	b) ./code/analyze/analyze_1_figures.do 
	    - Figure B1: Number of unique households and monthly electricity consumption, wave-specific
	    - Figure G1: Average electricity consumption by treatment, wave-specific 
	    - Figure G2: Event study plots, wave-specific  

2. Derive optimal treatment rules and estimate the gains associated with them using MATLAB: 
	a) ./code/analyze/analyze_3_estimate_ewm_rules.m 
	    - This script implements a series of searches over candidate treatment rules to select the best ones. The key parameters are saved for tables and figures constructed in part b) below.
	    - The grid search and CPLEX LP code are adapted from replication code from "Who Should Be Treated? Empirical Welfare Maximization Methods for Treatment Choice" by Toru Kitagawa and Aleksey Tetenov
	    - Input: estimation sample
	    - This script calls the following functions: 
		- generate_cross_sample.m: collapse the pooled panel data to a cross-sectional sample 
		- calculate_benchmark_savings.m: calculate the average treatment effect and the average treatment effect on the treated 
		- generate_tables_covariate.m: derive quadrant and cubic EWM rules based on covariates. It calls the following two functions:
			- generate_results_quadrant_rule.m: use grid search to solve for the optimal quadrant rule, and quantify value gain. 
			- generate_results_cubic_rule.m: solve the linear programming problem with CPLEX optimizer to derive optimal linear rule with cubic terms, and quantify value gain 
		- generate_tables_baseline.m: this function derives quadrant and cubic EWM rules based on pre-treatment consumption data only. It calls the following three functions: 
			- generate_results_onedimension.m
			- generate_results_quadrant_rule.m
			- generate_results_cubic_rule.m
		- generate_delta_CI_bootstrap.m: this function estimates confidence intervals for the value relative to the value of the original experiment. It calls the following two functions:
			- generate_delta_CI_bootstrap_quadrant.m
			- generate_delta_CI_bootstrap_cubic.m
		- generate_results_cross_waves.m: this function derives quadrant and cubic rules based on covariates, on the weighted sample wave. There are two pairs of sample and target waves: (sample wave 3, target wave 6) and (sample wave 6, target wave 7). It calls the following two functions:
			- generate_results_quadrant_external_validity.m
			- generate_results_cubic_external_validity.m
		- the following two functions perform the out-of-sample exercise, with 100 runs:
			- generate_cv_samples_permutation.m: generate training and testing sets
			- generate_cv_results_permutation.m: estimate quadrant and cubic rules on the training set and evaluate on the testing set
		- the following functions derive and evaluate EWM rules under budget constraints:
			- generate_results_capshare_quadrant_rule.m
			- generate_results_fixedshare_quadrant_rule.m
			- generate_results_capshare_onedimension.m
			- generate_results_fixedshare_onedimension.m
		- the following functions estimate EWM rules and evaluate gains using the full sample of all observations with non-missing pre-treatment consumption data:
			- generate_cross_sample_baseline_fullsample.m
			- generate_tables_baseline.m
	b) ./code/analyze/analyze_4_summarize_ewm_rules.m
	    - Summarize gains from targeted treatment rules into tables and plot the rules 
	    - Input: various mat files saved previously 
	    - This script calls the following functions and produces the following outputs:
		- generate_ewm_results_table.m outputs:
			- Table G.2: net energy savings of EWM rules 
			- Table G.3: net private cost savings of EWM rules 
			- Table G.4: net social cost savings of EWM rules 
		- summarize_savings_graph_covariate.m outputs:
			- Figure C.1: Quadrant rule grid search heatmap 
		- summarize_plots_for_slides.m outputs:
			- Figure 4: EWM rules maximizing private cost savings, based on covariates
		- generate_ewm_results_table_pe_only.m outputs:
			- Tables summarizing the point estimates of EWM rules based on pre-treatment consumption data only
		- summarize_savings_graph_baseline.m outputs:
			- Figure G.4: EWM rules maximizing private cost savings based on pre-treatment consumption data only
		- generate_delta_CI_table.m outputs:
			- Tables summarizing point estimates and confidence intervals for EWM savings relative to RCT savings 
		- generate_external_validity_table_cubic_all_specifications.m outputs:
			- Table 5 (same as Table D.2): energy and cost savings from the cross waves analysis, rules applied on target wave
			- Table D.1: energy and cost savings from the cross waves analysis, rules applied on sample wave
		- generate_cross_validation_permutation_100_table.m outputs:
			- Table D.3: out-of-sample performance of EWM rules on the pooled sample 
		- summarize_budget_constraint_table.m outputs:
			- Table E.1: performance of EWM rules under budget cap 
			- Table E.2: performance of EWM rules under fixed budget
		- summarize_inequality_analysis.m outputs:
			- Figure F.1: empirical CDF of income for treated households under various EWM rules
			- Figure F.2: distribution of racial diversity index 
			- Figure F.3: racial diversity index and treatment share under various EWM rules 
	c) The following utility scripts are used in various parts of the analysis: 
	     - grid search: empirical_density_generate_grid_onedimension.m, empirical_density_generate_grid.m, generate_inputs_cross_validation_onedimension.m, generate_inputs_cross_validation_quadrant.m, generate_inputs_quadrant_rule_cv.m, generate_inputs_quadrant_rule.m, generate_results_cross_validation_onedimension.m, generate_results_cross_validation_quadrant.m
	     - CPLEX LP: empirical_density_generate_gu_cubic.m, ewm_rct_test_cubic_input.m, generate_cplex_inputs_weight.m, generate_cplex_inputs.m, generate_inputs_cubic_rule_cv.m, generate_inputs_cubic_rule_reweight.m, generate_inputs_cubic_rule.m, generate_results_cross_validation_cubic.m
	     - facilitate parfor loop: generate_g_IPW.m, Y_demean.m
	     - pool and save bootstrap results: pool_BS.m, save_BS.m, pool_BS_delta_CI.m, save_BS_delta_CI.m 
	     - data type and data file conversions: matrix2latex.m, str2num_ci.m, table2latex.m, array2table_with_name.m
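
The grid search for a quadrant rule mentioned in part 2a) can be illustrated with a minimal Python sketch. It is not a translation of the package's MATLAB code. It assumes two covariates, treatment randomized with a known constant probability `p`, an outcome coded so that larger is better (for example, energy saved), and an inverse-probability-weighted welfare score in the spirit of Kitagawa and Tetenov. All function names and the toy data are hypothetical.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

# A quadrant rule treats units with s1*(x1 - t1) > 0 and s2*(x2 - t2) > 0.
Rule = Tuple[float, float, int, int]  # (t1, t2, s1, s2)


def ipw_scores(y: Sequence[float], d: Sequence[int], p: float) -> List[float]:
    """Per-unit score g_i = y_i*d_i/p - y_i*(1 - d_i)/(1 - p).

    ASSUMPTION: constant, known treatment probability p; larger y is better.
    """
    return [yi * di / p - yi * (1 - di) / (1 - p) for yi, di in zip(y, d)]


def in_quadrant(a: float, b: float, rule: Rule) -> bool:
    """Return True if covariates (a, b) fall in the treated quadrant."""
    t1, t2, s1, s2 = rule
    return s1 * (a - t1) > 0 and s2 * (b - t2) > 0


def grid_search_quadrant_rule(
    x1: Sequence[float],
    x2: Sequence[float],
    g: Sequence[float],
    grid1: Sequence[float],
    grid2: Sequence[float],
) -> Tuple[float, Optional[Rule]]:
    """Try every threshold pair and direction; keep the rule with the largest
    estimated gain relative to treating no one."""
    n = len(g)
    best_gain, best_rule = float("-inf"), None
    for rule in product(grid1, grid2, (1, -1), (1, -1)):
        gain = sum(gi for a, b, gi in zip(x1, x2, g) if in_quadrant(a, b, rule)) / n
        if gain > best_gain:
            best_gain, best_rule = gain, rule
    return best_gain, best_rule


# Toy data (hypothetical): treatment only helps units with high x1 and high x2.
x1 = [0.2, 0.2, 0.8, 0.8, 0.2, 0.2, 0.8, 0.8]
x2 = [0.2, 0.8, 0.2, 0.8, 0.2, 0.8, 0.2, 0.8]
d = [1, 1, 1, 1, 0, 0, 0, 0]
y = [0.0, 0.0, 0.0, 3.0, 1.0, 1.0, 1.0, 1.0]

g = ipw_scores(y, d, p=0.5)
best_gain, best_rule = grid_search_quadrant_rule(x1, x2, g, [0.0, 0.5], [0.0, 0.5])
print(best_gain, best_rule)
```

On the toy data, the search selects the quadrant with both covariates above their thresholds, since that is the only cell where treated outcomes exceed control outcomes. An exhaustive search like this scales with the product of the two grid sizes, which is presumably why the cubic rule is handled by a linear program instead.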

3. Plot estimates of gains from targeted treatment assignment using R:
	a) ./code/analyze/coef_plot/coefficient_plots.R:
		- dependencies managed using the renv package
		- input: previously saved estimation results in txt files
		- output:
			- Figure 2: energy savings of EWM rules based on demographics data
			- Figure 3: private cost savings of EWM rules based on demographics data
			- Figure 5: social cost savings of EWM rules based on demographics data
			- Figure 6: private cost savings of EWM rules based on demographics data and pre-treatment consumption data only
			- Figure G.3: energy savings and social cost savings of EWM rules based on demographics data and pre-treatment consumption data only
